-
Notifications
You must be signed in to change notification settings - Fork 50
Fix: Replace non-ASCII and non-printable characters with dots in logger #1218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
@@ -69,6 +69,11 @@ func TestBodyLoggerNotJson(t *testing.T) { | |||
assert.Equal(t, dump, "<html>") | |||
} | |||
|
|||
func TestBodyLoggerNotAscii(t *testing.T) { | |||
dump := bodyLogger{debugTruncateBytes: 20}.redactedDump("", []byte("\x01 🚀 🚀")) | |||
assert.Equal(t, dump, ". . .") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, why 🚀 is filtered out? It's printable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it helps to see a lot of dots if request body is a large gzipped file? Compressing them all into single <binary>
would be more useful for logs.
There could also be a case where users actually need to see the response (for debugging). In that case applying lossy transformation makes that impossible and thus worse than having binary in the output (which you could use if you really wanted to).
If decompressing is not possible / out of scope then printing escaped string could be next best option, e.g. we can adopt Python's approach:
>>> open('cli', 'rb').read(100)
b'\xcf\xfa\xed\xfe\x0c\x00\x00\x01\x00\x00\x00\x00\x02\x00\x00\x00\x12\x00\x00\x00\xd8\n\x00\x00\x04\x00 \x00\x00\x00\x00\x00\x19\x00\x00\x00H\x00\x00\x00__PAGEZERO\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
@@ -122,7 +123,15 @@ func (b bodyLogger) redactedDump(prefix string, body []byte) string { | |||
sb.WriteString("\n") | |||
} | |||
line = strings.Trim(line, "\r") | |||
sb.WriteString(fmt.Sprintf("%s%s", prefix, line)) | |||
sb.WriteString(prefix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's maybe out of scope for this PR but I'd rather it decompressed the response (based on content-encoding header) and printed it.
If the reason I use debug log is to see requests & responses then that's what I'd be interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered escaping the string so that the conversion remains lossless?
What changes are proposed in this pull request?
Never log non-ASCII or non-printable characters in log output.
Request logs may include binary data if it isn't JSON. This can affect the terminal state.
Motivated by databricks/cli#2804.
How is this tested?
Unit test that confirms that non-printable ASCII and UTF-8 characters are replaced with a
.
.